Abstract. The large scale distribution of galaxies in the universe displays a complex 
pattern of clusters, super-clusters, filaments and voids with sizes limited only by the 
boundaries of the available samples. A quantitative statistical characterization of these 
structures shows that the galaxy distribution is inhomogeneous in these samples, being 
characterized by large-amplitude fluctuations of large spatial extension. Over a large 
range of scales, both the average conditional density and its variance show a nontrivial 
scaling behavior: at small scales, r < 20 Mpc/h, the average (conditional) density 
scales as r^{-1}. At larger scales, the density depends only weakly (logarithmically) on 
the system size and density fluctuations follow the Gumbel distribution of extreme 
value statistics. These complex behaviors are different from what is expected 
in a homogeneous distribution with Gaussian fluctuations. The observed density 
inhomogeneities pose a fundamental challenge to the standard picture of cosmology, 
but they also represent an important opportunity which points to new directions with 
respect to many cosmological puzzles. Indeed, the fact that the matter distribution is not 
uniform, in the limited range of scales sampled by observations, raises the question of 
understanding how inhomogeneities affect the large-scale dynamics of the universe. We 
discuss several attempts which try to model inhomogeneities in cosmology, considering 
their effects with respect to the role and abundance of dark energy and dark matter. 
1. Introduction 

Cosmological observations are usually interpreted within a theoretical framework based 
on the simplest conceivable class of solutions of Einstein's law of gravitation. Namely, 
by assuming that the universe is homogeneous and isotropic, one is able 
to work out, from Einstein's field equations, the dynamics of space-time. The 
Friedmann-Robertson-Walker (FRW) geometry is derived under these two assumptions 
and it describes the geometry of the universe in terms of a single function, the scale 
factor, which obeys the Friedmann equations. In this situation, the matter density 
is constant on a spatial hyper-surface. On top of the constant matter density one 
can consider a statistically homogeneous and isotropic field of small-amplitude fluctuations. 
These fluctuations furnish the seeds of gravitational clustering which will eventually give 
rise to the structures we observe in the present universe. The evolution of fluctuations is not 
considered to have an appreciable effect on the evolution of space-time, which is instead 
driven by the uniform mean field. 

While the simplicity of this scenario has its own appeal, in the standard model 
of cosmology one has to conjecture the existence of two fundamental constituents, both 
of still unknown origin, if observational constraints are to be met. First, a dominant 
repulsive component is thought to exist that can be modeled, for instance, by a positive 
cosmological constant. The physical nature of this component, named Dark Energy, is 
still unknown and its abundance cannot be inferred from a-priori principles. It 
is widely believed that dark energy is the biggest puzzle in standard cosmology today; 
e.g., the value of the cosmological constant in cosmology seems absurdly small in the 
context of quantum physics [2]. 

There is, secondly, a non-baryonic component that should considerably exceed the 
contribution by luminous and dark baryons and massive neutrinos. This component, 
named non-baryonic Dark Matter, is thought to be provided by exotic forms of matter, 
not yet detected in laboratory experiments. The main peculiar property of this matter 
component is that it interacts only weakly with radiation, as required to meet the 
constraints given by observations [3]. According to the concordance model of standard 
cosmology, the contribution of the former converges to about 3/4 and that for the latter 
to about 1/4 of the total source of the standard cosmological equations (Friedmann 
equations), while up to a few percent has to be attributed to what is instead directly 
measurable by observations, namely ordinary baryonic matter, radiation and neutrinos. 

If the underlying cosmological model is not a perturbation of an exact flat FRW 
solution, the conventional data analysis and its interpretation are not necessarily valid, 
and thus the estimates of Dark Matter and Dark Energy can be questionable. The 
breaking of the FRW solution can be caused, for instance, by strong inhomogeneities of 
large spatial extension in the matter distribution. If this were the case, the theoretical 
problem would then be whether the inhomogeneous properties of the Universe can 
be described by the strong FRW idealization and/or in which limit this would be so. 

The question of whether observations of galaxy structures satisfy, on some scales, 
the assumptions used to derive the FRW metric is thus central. Surprisingly enough, 
studies of the large scale distribution of matter in the universe, as sampled by galaxy 
structures, seem not to be trivially compatible with such a theoretical scenario. Indeed, 
more and more structures on larger and larger scales have been discovered in the course 
of the last two decades with the advent of the three dimensional maps of the large scale 
universe. These structures were unexpected both because two-dimensional (angular) 
surveys were rather uniform and because theoretical models were unable to predict 
their existence. In many cases it was concluded that the particular three-dimensional 
survey under consideration had picked up a particularly "rare" fluctuation, rare 
with respect to the Gaussian distribution of fluctuations predicted by theoretical models. The 
statistical characterization of these structures, determining the range of correlations and 
the amplitudes and sizes of inhomogeneities, has thus posed a fundamental challenge 
to the standard picture of cosmology. The key problem would then be to include these 
large fluctuations in the cosmological dynamics in a coherent way. 

As long as structures are limited to small sizes, and fluctuations have low amplitude, 
one can just treat fluctuations as small-amplitude perturbations to the leading order 
FRW approximation. However, if structures have "large enough" sizes and "high enough" 
amplitudes, a perturbative approach may lose its validity and a more general treatment 
of inhomogeneities needs to be developed. From the theoretical point of view, it is then 
necessary to understand how to treat inhomogeneities in the framework of General 
Relativity. In this context the first issue is whether inhomogeneities can be described 
by the FRW idealization at least on average, by postulating that on large enough 
scales uniformity is eventually reached. In other words, the key question is: does a 
model of the Universe that is inhomogeneous at relatively small scales and uniform at large 
scales evolve on average like a homogeneous solution of Einstein's law of gravitation? 
Currently there is a wide discussion in the literature on this issue because, in the 
framework of averaged cosmological equations provided by Buchert [1], it 
was found that Dark Energy (and possibly also Dark Matter) 
could be explained, partially or fully, as an effect of structure formation in an inhomogeneous 
cosmology. Inhomogeneities may mimic the effect of Dark Energy [5]. 

Thus, while observations of galaxy structures have given an impulse to the search 
for more general solutions of Einstein's equations than the Friedmann one, it is now 
under intense investigation whether such a more general framework may provide a 
different explanation of the various effects that, within the standard FRW model, have 
been interpreted as Dark Energy and Dark Matter. 

In these proceedings we first review in Sect. 2 the situation with respect to galaxy 
structures: their observations and the analysis of their statistical properties. We then 
discuss in Sect. 3 the implications of the results on galaxy distribution with respect to 
the standard theoretical assumptions of the FRW model. This allows us to properly 
frame the problem of inhomogeneities from the point of view of theoretical modeling. 
We then draw, in Sect. 4, our main conclusions. 
2. Size and amplitude of galaxy structures 

In one of his seminal papers, Gerard de Vaucouleurs [6] put into a historical perspective 
the problem of galaxy large scale structures and the question about the scale where 
the galaxy distribution turns to uniformity (where by uniformity is meant the absence 
of structures of large amplitude or, in other words, the absence of voids; see below). 
He pointed out that observations first found that galaxies are not randomly 
distributed; then, in the fifties, the same property was assigned to cluster centers; 
finally, at the end of the sixties, the discovery of super-clusters further enlarged 
the scale of structures in the universe, thus pushing to larger and larger scales the scale 
where the approach to uniformity occurs. In the last twenty years many observations 
have been dedicated to the study of the three-dimensional‡ large-scale distribution of 
galaxies [7] [8] [9] [10] [11]. In particular, during the last decade two ambitious observational 
programs have measured the redshift of more than one million objects [12] [13]. All 
these surveys have detected larger and larger structures, thus finding that galaxies are 
organized in a complex network of clusters, super-clusters, filaments and voids. 

For instance the famous "slice of the Universe", which represented the first set of 
observations done for the CfA Redshift Survey in 1985 [14], mapped spectroscopic 
observations of about 1100 galaxies in a strip on the sky 6 degrees wide and about 
130 degrees long. This initial map was quite surprising, showing that the distribution 
of galaxies in space was anything but random, with galaxies actually appearing to be 
distributed on surfaces, almost bubble-like, surrounding large empty regions, or "voids". 
The structure running all the way across the survey between 50 and 100 Mpc/h§ was 
called the "Great Wall" and at the time of its discovery was the largest single structure 
detected in any redshift survey. Its dimensions, limited only by the sample size, are 
about 200 × 80 × 10 Mpc/h, a sort of giant quilt of galaxies across the sky [15]. 
More and more galaxy large scale structures were identified in the other redshift surveys 
such as the Perseus-Pisces super-cluster p] which is one of two dominant concentrations 
of galaxies in the nearby universe. This long chain of galaxies lies next to the so-called 
Taurus void, which is a large circular void bounded by walls of galaxies on either 
side of it. The void has a diameter of about 30 Mpc/h. A few years ago, in the larger 
sample provided by the Sloan Digital Sky Survey (SDSS) [13], the 
Sloan Great Wall [16] was discovered: a giant wall of galaxies which is the largest known 
structure in the Universe, being nearly three times longer than the Great Wall. 

‡ This is achieved by measuring the redshift $z$ of a galaxy in addition to its angular coordinates. 
Hubble's law linearly relates a galaxy's redshift to its distance, $R = (c/H_0)\, z$, where $c$ is the speed of light and 
$H_0$, the Hubble constant, is an observationally determined parameter. 

§ We use $H_0 = 100\,h$ km/sec/Mpc for the value of the Hubble constant; $h$ is a parameter constrained 
by observations to be in the range [0.5, 0.75]. Note that 1 Mpc $\approx 3 \times 10^{24}$ cm and the size of the universe is 
thought to be 4000 Mpc/h. 
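
As a rough illustration of the distance convention described in the two footnotes above, the following minimal Python sketch converts a redshift into a distance through the linear Hubble law. The function name and the sample redshifts are hypothetical choices made here for illustration; the linear relation is only a low-redshift approximation.

```python
# Minimal sketch (illustrative names): linear Hubble law R = c z / H0,
# with H0 = 100 h km/s/Mpc. Valid only as a small-redshift approximation.

C_KM_S = 299_792.458  # speed of light in km/s


def hubble_distance_mpc(z, h=1.0):
    """Return the distance in Mpc for redshift z and dimensionless parameter h."""
    h0 = 100.0 * h  # Hubble constant in km/s/Mpc
    return C_KM_S * z / h0


if __name__ == "__main__":
    # With h = 1 the numbers can be read directly in units of Mpc/h.
    for z in (0.01, 0.05, 0.1):
        print(f"z = {z}: R = {hubble_distance_mpc(z):.1f} Mpc/h")
```

Setting `h=1.0` returns distances directly in Mpc/h, which is why quoting lengths in these units avoids committing to a specific value of the Hubble constant.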
2.1. A characteristic scale for galaxy clustering? 

Despite the fact that large scale galaxy structures, with sizes of the order of several 
hundreds of Mpc/h, have been observed to be a typical feature of the distribution 
of visible matter in the local universe, the statistical analysis measuring their properties 
has identified a characteristic scale which has changed only slightly since its discovery 
forty years ago in angular catalogs. Surprisingly enough, in these samples, where only 
the angular coordinates of galaxies were measured, the complex network of structures 
subsequently discovered with the advent of redshift surveys was not at all evident. 

Indeed, the characteristic scale $r_0$, defined to be the one at which fluctuations in 
the galaxy density field are about twice the value of the sample density, was measured 
to be $r_0 \sim 5$ Mpc/h in the Shane and Wirtanen angular catalog [17]. More recent 
measurements of this scale [18] [19] [20] [21] [22] [23] [24] [25] in the three dimensional 
catalogs found that $r_0$ fluctuates in the range $r_0 \sim 5 - 15$ Mpc/h‖. This variation was 
then ascribed to a luminosity dependent effect (see e.g. [19l EHl EB El]). The small 
value of the scale $r_0$ seems not to characterize the spatial extension of structures, which 
can be even two orders of magnitude larger. Indeed, $r_0$ is a scale related to a specific 
value of the amplitude of density fluctuations relative to the average density and not to 
their spatial extension. 

To understand this situation we recall some elementary concepts [26]. 
For stationary density fields the quantity $\langle \rho(\vec r_1)\rho(\vec r_2)\rangle$ is called the complete 2-point 
correlation function (where $\langle ... \rangle$ denotes the ensemble average). If the field is 
statistically homogeneous and isotropic then this function depends only on the scalar distance 
between points, $r_{12} = |\vec r_1 - \vec r_2|$. For a spatially uniform density field, for which the 
ensemble average density is $\rho_0 > 0$, it is useful to consider the reduced correlation 
function 

$$ C_2(r_{12}) = \langle (\rho(\vec r_1) - \rho_0)(\rho(\vec r_2) - \rho_0) \rangle . \qquad (1) $$

This is the main function used to study spatial correlations between fluctuations from 
the average. The dimensionless two-point correlation function usually considered in the 
analysis of galaxy distributions is defined as 

$$ \xi(r_{12}) = \frac{C_2(r_{12})}{\rho_0^2} = \frac{\langle \rho(\vec r_1)\rho(\vec r_2)\rangle - \rho_0^2}{\rho_0^2} . \qquad (2) $$

Note that this is well defined only when $\rho_0 > 0$. This function is simply related to the 
normalized mass variance in a volume $V(R)$ of linear size $R$, 

$$ \sigma^2(R) = \frac{\langle M(R)^2\rangle - \langle M(R)\rangle^2}{\langle M(R)\rangle^2} , \qquad (3) $$

by the following relation [26] 

$$ \sigma^2(R) = \frac{1}{V^2(R)} \int_{V(R)} d^3r_1 \int_{V(R)} d^3r_2 \, \xi(r_{12}) . \qquad (4) $$

‖ However, depending on how the sample density is estimated, much larger values can also be obtained EH [32]. 



The complex universe: recent observations and theoretical challenges 






For a spatially uniform density field, the scale $r_*$, at which fluctuations are of the order 
of the mean, i.e. 

$$ \sigma(r_*) = 1 , \qquad (5) $$

is proportional to the scale $r_0$ at which $\xi(r_0) = 1$. 

Consider now the case in which correlations have a finite range so that 

$$ \xi(r) = A \exp(-r/r_c) , \qquad (6) $$

where $r_c$ is the correlation length of the distribution and $A$ is a constant. Structures 
of fluctuations have a size determined by $r_c$. This length scale does not depend on the 
amplitude $A$ of the correlation function but only on its rate of decay. On the other hand 
the scale at which $\xi(r_0) = 1$ is 

$$ r_0 = r_c \cdot \log(A) , \qquad (7) $$

and it depends on the amplitude $A$ of the correlation function. Thus the two scales $r_0$ 
and $r_c$ have a completely distinct meaning: the former marks the crossover from large to 
small fluctuations while the latter quantifies the typical size of clusters of small-amplitude 
fluctuations. When $\xi(r)$ is a power-law function of separation (with an exponent in 
the range $0 < \gamma < 3$) then the correlation length $r_c$ is infinite and there are clusters of 
all sizes [26]. 
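
The distinction between the two scales can be made concrete with a small numerical sketch. The Python snippet below evaluates the relation $r_0 = r_c \log(A)$ for an exponentially decaying correlation function; the function names and the sample values of $A$ and $r_c$ are illustrative assumptions, not values taken from any catalog.

```python
# Minimal sketch (illustrative values): for xi(r) = A * exp(-r / r_c),
# the scale r_0 where xi(r_0) = 1 moves with the amplitude A,
# while the correlation length r_c stays fixed.
import math


def xi_exponential(r, amplitude, r_c):
    """Exponentially decaying two-point correlation function."""
    return amplitude * math.exp(-r / r_c)


def r0_exponential(amplitude, r_c):
    """Scale at which xi(r_0) = 1, i.e. r_0 = r_c * log(A); needs A > 1."""
    if amplitude <= 1.0:
        raise ValueError("xi(r) never reaches 1 at r > 0 unless A > 1")
    return r_c * math.log(amplitude)


if __name__ == "__main__":
    r_c = 100.0  # hypothetical correlation length, arbitrary length units
    for amplitude in (2.0, 10.0, 100.0):
        r0 = r0_exponential(amplitude, r_c)
        print(f"A = {amplitude:6.1f}  r_c = {r_c}  r_0 = {r0:.1f}")
```

Changing the amplitude at fixed `r_c` shifts `r_0` without altering the range of correlations, which is the sense in which $r_0$ does not measure the size of structures.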

In a finite sample, to give a physical meaning to $r_0$ or to $r_*$ one needs to verify 
that the average density is a well-defined concept. Indeed, if the ensemble density is 
asymptotically zero the normalization of the amplitudes of correlations to this value 
is not possible. Nevertheless, in such a situation, in a finite sample, one estimates 
the average density to be positive, with its precise value depending on the sample 
size [26]. This estimate is however biased by the finite size of the sample. Thus, 
prior to the analysis of fluctuations with statistical quantities like those defined in Eq. (2) 
and Eq. (3), it is necessary to carefully investigate whether the average density is stable 
"enough" in the available samples. Note that there has been an intense debate in the 
last decade concerning the statistical methods employed to characterize galaxy structures 
[271[2Hl[29l[26l|30l[3ll[32l|32l[33l|35. This originated when it was realized [35] [36] 
that the normalization of two-point correlations to the sample average, as usually done 
in the field, can be problematic. 

We thus face two distinct questions: What is the typical size of structures? What 
is the amplitude of fluctuations at a given scale? The extension of structures can be 
statistically measured by the range of positive correlations. The characterization of 
the amplitude of fluctuations can instead be achieved by studying their (conditional) PDF 
¶ Note that we are discussing the case in which the distribution is spatially uniform. This means 
that small amplitude fluctuations have power-law correlations. On the other hand, the estimator of 
the function $\xi(r)$ can display power-law correlations even when the distribution itself presents scaling 
behavior, as in a fractal [26]. In such a situation, however, the (conditional) density itself presents 
scaling behavior, and the $\xi(r)$ analysis does not provide statistically meaningful information (see 
discussion below). 
as we discuss below. The key question we face concerns the quantification of absolute 
fluctuations and not of relative ones normalized to a sample estimate of the average 
density. The distinction between these two cases becomes especially relevant when the 
distribution is dominated by a few structures, as in such a situation the concept of 
average density may lose its statistical meaning [26]. 

A simple way to perform this measurement consists in the comparison of galaxy 
counts in different angular regions El. Many authors found that there are fluctuations 
of the order of 30% on scales of the order of 200 Mpc in a number of different three 
dimensional and angular catalogs [38] [39] [40] [41] [42] [43], implying that there is more 
large-scale power than detected by the standard correlation function analysis 
[33] [31]. As we discuss below, large scale structures and wide fluctuations at scales of 
the order of 100 Mpc/h or more are at odds both with the small value of the characteristic 
length scale $r_0$ and with the predictions of the concordance model of galaxy formation 

[391 sol HI]. 

To clarify this puzzling situation, i.e. the coexistence of the small typical length 
scales measured by the two-point correlation function analysis with the large fluctuations 
in the galaxy density field on large scales as measured by simple galaxy counts, one 
needs to consider in detail the assumptions and the limits of the statistical analysis by 
which the quantitative characterization of structures is performed. Before entering into 
such a discussion let us briefly review the main predictions of structure formation models 
in the standard cosmological scenarios. 

2.2. Predictions from structure formation models 

In structure formation models gravitational clustering drives non-linear structure 
formation from an initially uniform density field. Due to the small initial velocity 
dispersion, structure formation occurs in a bottom-up manner and thus fluctuations 
remain of small amplitude at large enough scales while they acquire, as time evolves, 
a large relative amplitude on some small scales. In this situation, given that the large-scale 
uniformity is preserved, the average density is a well defined concept at all times 
and the length-scale $r_0$ does identify the typical size of non-linear structures. This scale 
represents one of the main predictions of theoretical models which must be confronted 
with observations. 

Theoretical models of galaxy formation, like the Cold Dark Matter (CDM) one 
(see e.g. [1^), are able to predict the scale $r_0$ once the amplitude 
and correlation properties of the density field fluctuations in the early universe are given. Note 
that in a CDM model the main dynamical role is played by non-baryonic dark matter 
and the addition of dark energy modifies mainly the global dynamical properties of the 
cosmology. In other words, as mentioned in the introduction, a CDM universe with 
a cosmological constant (i.e., a ΛCDM model) is made of about 1/4 of non-baryonic 
+ For this test one may consider counts as a function of distance and/or of apparent luminosity 
(magnitude): they both may quantify fluctuations in different sky regions [21] [33]. 
dark matter and 3/4 of dark energy, with a small fraction of ordinary baryonic matter. 
The theoretical motivation for these dark substances, as mentioned in the introduction, 
finds its roots in the need for an exotic matter which interacts weakly with radiation 
and for a repulsive substance, both required to satisfy the constraints imposed by a series of 
cosmological observations [15]. 

Indeed, in these models, density fluctuations in the early universe are coupled with 
radiation. Therefore one may obtain the normalization of the initial conditions by 
measuring the amplitude and correlation properties of the anisotropies of the Cosmic 
Microwave Background Radiation (CMBR). Then, by calculating the evolution of small 
density fluctuations in the linear perturbation analysis of a self-gravitating fluid in an 
expanding universe, it is possible to predict the scale $r_0$ today. This turns out, in 
current models such as the ΛCDM ones, to be $r_0 \sim 5$ Mpc/h [46]. On scales $r < r_0$ models 
are unable to make precise predictions on the shape of the correlation function because 
gravitational clustering in the non-linear regime is difficult to treat. Gravitational 
N-body simulations are then used to investigate structure formation in the non-linear 
phase. 

In addition, models predict a precise type of small-amplitude 
fluctuations for $r > r_0$. It is thus possible to simply relate, for $r > r_0$, by using the linear 
perturbation analysis mentioned above, the properties of fluctuations in the present 
matter density field to those in the early universe. CDM models predict that for 
$r_0 < r < r_c$, fluctuations have very small amplitude and weak positive correlations. 
The situation in this range of scales is well-approximated by Eq. (6) (see [37] for details). 
The length-scale $r_c$ thus represents the cut-off in the size of weak-amplitude (positively 
correlated) structures in standard models. In addition, for $r > r_c$*, models predict that 
the matter density field presents a specific type of anti-correlations [IS]. In particular, 
in these models correlations and anti-correlations are finely balanced in such a way that 

$$ \int_0^\infty d^3r \, \xi(r) = 0 , \qquad (8) $$

which is a global constraint on the behavior of the two-point correlation function 
$\xi(r)$ corresponding to the super-homogeneous properties of cosmological density fields. 
In brief, this is a global condition on the correlation properties of the matter density 
field, which can be understood as a consistency constraint in the framework of FRW 
cosmology, and it corresponds to a very fine-tuned balance between negative and positive 
correlations of density fluctuations and to the fastest possible decay of the normalized 
mass variance on large scales (see |1HI EHl HH] for a discussion on this topic). 

The fundamental tests for current models of galaxy formation then concern: (i) 
whether or not density fluctuations at large scales (i.e. $r > 10$ Mpc/h) have small amplitude 
and (ii) whether there are anti-correlations on scales $r \gtrsim 100$ Mpc/h [47]. 
The primary problem to be considered in this respect concerns the statistical methods 
used to measure the amplitude of fluctuations and the range of correlations. 

* The scale $r_c$ is estimated from CMBR measurements to be $r_c \approx 100$ Mpc/h [17]. 




2.3. Spatial homogeneity and self-averaging properties 

The problem of the statistical characterization of these structures in a finite sample, of 
volume $V$ containing $M$ galaxies, can be rephrased as the problem of measuring volume 
averaged statistical quantities. The basic issue concerns whether these are meaningful 
descriptors, i.e. whether or not they give stable statistical estimates of ensemble 
averaged quantities [31]. This problem is particularly important when only a few large 
scale structures are present in a sample, a situation that occurs when correlations are 
long-ranged. 

In general it is assumed that the galaxy distribution is an ergodic stationary stochastic 
process [26], which means that it is statistically translationally and rotationally 
invariant, thus avoiding special points or directions. Stationary stochastic distributions 
satisfy these conditions also when they have zero asymptotic average density in the 
infinite volume limit [26]. The assumption of ergodicity implies that in a single 
realization of the microscopic number density field $n(\vec r)$ the average density $n_0$ in the 
infinite volume is well defined and equal to the ensemble average density [26]. The 
constant $n_0$ is strictly positive for homogeneous distributions and can be asymptotically 
zero for infinite inhomogeneous ones [26]. The infinite volume limit must be considered 
in the definition of probabilistic properties, but in any real sample one is concerned 
only with finite volumes and statistical determinations. 

In inhomogeneous systems, like fractals, unconditional quantities are not well 
defined as these distributions are characterized by having a (conditional) average density 
which scales with the sample size and tends to zero as a power-law [26]. In this 
situation only conditional quantities can be well-defined from a statistical point of view. 
In addition, by studying conditional properties one is able to make reliable tests to 
determine whether a distribution is spatially uniform. 

The simplest one is the conditional density [26]: its ensemble average can be defined 
as 

$$ \langle n(r_{12}) \rangle_p = \frac{\langle n(\vec r_1) n(\vec r_2) \rangle}{n_0} . \qquad (9) $$

It measures the density of points at distance $r$ from a point of the distribution. For 
a fractal object one has $\langle n(r)\rangle_p \propto r^{D-3}$, where $0 < D < 3$, so that it tends to zero 
in the infinite volume limit, making the definition of the correlation function in Eq. (2) 
meaningless. The statistical estimator of this latter statistic, in a finite sample, can be 
written as 

$$ \xi(r) = \frac{\langle n(r) \rangle_p}{n_S} - 1 , \qquad (10) $$

where $n_S = \langle n(R_s)\rangle_p$ is the estimate of the average density at the scale $R_s$ of the 
sample itself. It is clear that when $\langle n(r)\rangle_p$ has a power-law behavior, $\xi(r)$ depends on 
the sample size, resulting in an intrinsic bias of this function [26]. 
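
The sample-size dependence can be sketched numerically. The snippet below assumes, for illustration only, a pure power-law conditional density $\langle n(r)\rangle_p = B\, r^{D-3}$ and an estimator of the form $\xi(r) = \langle n(r)\rangle_p / n_S - 1$ with $n_S = \langle n(R_s)\rangle_p$; under these assumptions the scale where $\xi(r_0) = 1$ is $r_0 = R_s\, 2^{-1/(3-D)}$, i.e. proportional to the sample size. Function names and parameter values are hypothetical.

```python
# Minimal sketch under stated assumptions: power-law conditional density
# n_p(r) = B * r**(D - 3) and estimator xi(r) = n_p(r) / n_S - 1,
# with n_S = n_p(R_s) evaluated at the sample scale R_s.


def conditional_density(r, prefactor, dimension):
    """Power-law conditional density for a fractal of dimension D < 3."""
    return prefactor * r ** (dimension - 3.0)


def xi_estimate(r, sample_size, prefactor, dimension):
    """Finite-sample estimate of xi(r), normalized to the density at R_s."""
    n_s = conditional_density(sample_size, prefactor, dimension)
    return conditional_density(r, prefactor, dimension) / n_s - 1.0


def r0_estimate(sample_size, dimension):
    """Solve (r0 / R_s)**(D - 3) = 2 for r0."""
    return sample_size * 2.0 ** (-1.0 / (3.0 - dimension))


if __name__ == "__main__":
    dimension = 2.0  # hypothetical fractal dimension
    for sample_size in (50.0, 100.0, 200.0):
        r0 = r0_estimate(sample_size, dimension)
        print(f"R_s = {sample_size:6.1f}  ->  r_0 = {r0:.1f}")
```

In this toy setting doubling the sample size doubles `r_0`, so the scale reflects the sample rather than an intrinsic property of the distribution.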




Given that the finite sample estimate of the average mass density is positive 
(unless the sample is empty), for inhomogeneous distributions the relative error with 
respect to the ensemble value (i.e., zero) can be arbitrarily large [25]. This situation 
occurs as long as the sample size is smaller than the scale $\lambda_0$ at which the distribution 
eventually turns to homogeneity, i.e. beyond which density fluctuations are small and 
the conditional density becomes flat, 

$$ \langle n(r)\rangle_p \approx \mathrm{const.} , $$

so that $D = 3$ for $r > \lambda_0$ [26]. In the finite sample analysis it is then necessary to study 
the conditional scaling properties of statistical quantities, by an analysis of fluctuations 
and correlations which explicitly considers whether or not a distribution is spatially 
homogeneous. 

There is however an additional problem to be carefully considered when dealing 
with inhomogeneous distributions in finite samples: one has to test whether local 
sample fluctuations allow the determination of average conditional quantities [32] [31] [50]. 
Indeed, statistical properties are determined by making averages over the whole sample 
volume [26]. In doing so one implicitly assumes that a certain quantity measured in 
different regions of the sample is statistically stable, i.e., that fluctuations in different 
sub-regions are described by the same PDF. However, it may happen that measurements 
in different sub-regions show systematic (i.e., not statistical) differences, which depend, 
for instance, on the spatial position of the specific sub-regions. In this case the considered 
statistic is not statistically stationary in space, the fluctuations systematically differ in 
different sub-regions and whole-sample average values are not meaningful descriptors 
[26] [31]. For a stationary stochastic point process this situation corresponds to the 
lack of self-averaging properties [31]. On the other hand, a systematic dependence of 
the PDF on the specific position of the sample volume may correspond to the lack 
of stationarity (i.e., lack of statistical translational invariance) of the distribution, a 
situation that occurs, for instance, when there is a center breaking overall translational 
invariance [50]. 

In order to define a quantitative test for self-averaging, let us recall that a crucial 
assumption usually made is that stochastic fields satisfy spatial ergodicity. 
Let us take a generic observable 

$$ \mathcal{F} = \mathcal{F}(\rho(\vec r_1), \rho(\vec r_2), ...) , $$

a function of the mass distribution $\rho(\vec r)$ at different points in space $\vec r_1, \vec r_2, ...$. Ergodicity 
implies that 

$$ \langle \mathcal{F} \rangle = \overline{\mathcal{F}} = \lim_{V \to \infty} \overline{\mathcal{F}}_V , \qquad (11) $$

where $\overline{\mathcal{F}}_V$ is the spatial average in a finite volume $V$ [26]. When considering a finite 
sample realization of a stochastic process, and thus statistical estimators of asymptotic 
quantities, the first question to be sorted out concerns whether a certain observable is 
self-averaging in a given finite volume [51] [31]. In general a stochastic variable $\mathcal{F}$ is 
self-averaging if $\overline{\mathcal{F}}_V = \langle \mathcal{F} \rangle$ (see [31] for a more detailed discussion). Thus if $\mathcal{F}$ is ergodic, 
$\overline{\mathcal{F}} = \langle \mathcal{F} \rangle$, then it is also self-averaging: finite sample spatial averages must 
be self-averaging in order to satisfy spatial ergodicity. 
A simple test to determine whether a distribution is stationary and self-averaging 
in a given sample of linear size $L$ consists in studying the probability density function 
(PDF) of conditional quantities $Q$ (which contains, in principle, all information about 
moments of any order) in sub-samples of linear size $L' < L$ placed in different and 
non-overlapping spatial regions of the sample (i.e., $S_1, S_2, ..., S_N$). That the self-averaging 
property holds is shown by the fact that $P(Q, L'; S_i)$ is the same, modulo statistical 
fluctuations, in the different sub-samples, i.e., 

$$ P(Q, L'; S_i) \approx P(Q, L'; S_j) \quad \forall i \neq j . \qquad (12) $$

On the other hand, if determinations of $P(Q, L'; S_i)$ in different sample regions $S_i$ show 
systematic differences, then there are two different possibilities: (i) the lack of the 
property of stationarity or (ii) the breaking of the property of self-averaging due to a 
finite-size effect related to the presence of long-range correlated fluctuations. Therefore, 
while the breaking of statistical homogeneity and/or isotropy implies the lack of the 
self-averaging property, the reverse is not true. However, if the determinations of the spatial 
averages give sample-dependent results, this implies that those statistical quantities do 
not represent the asymptotic properties of the given distribution [31]. 
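
One possible way to quantify whether the PDFs measured in different sub-samples agree is sketched below in Python. It uses the two-sample Kolmogorov-Smirnov distance between empirical distributions; this particular statistic is a choice made here for illustration and is not stated in the text to be the method used for the galaxy samples. The function names and the synthetic data are hypothetical.

```python
# Minimal sketch (illustrative, not the authors' procedure): compare the
# empirical distributions of a quantity measured in different sub-samples
# using the two-sample Kolmogorov-Smirnov distance.
import random


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


def max_pairwise_ks(subsamples):
    """Largest KS distance over all pairs of sub-samples (i != j)."""
    worst = 0.0
    for i in range(len(subsamples)):
        for j in range(i + 1, len(subsamples)):
            worst = max(worst, ks_statistic(subsamples[i], subsamples[j]))
    return worst


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins for a quantity Q measured in three sub-regions.
    same = [[random.gauss(100.0, 10.0) for _ in range(500)] for _ in range(3)]
    shifted = same[:2] + [[random.gauss(140.0, 10.0) for _ in range(500)]]
    print("similar sub-samples:", max_pairwise_ks(same))
    print("one shifted sub-sample:", max_pairwise_ks(shifted))
```

A small maximum distance is compatible with the same PDF in all sub-regions, while a large one signals systematic differences; deciding between lack of stationarity and lack of self-averaging requires additional information, as discussed above.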

As mentioned above, to test statistical and spatial homogeneity it is necessary to 
employ statistical quantities that do not require the assumption of spatial homogeneity 
inside the sample and thus avoid the normalization of fluctuations to the estimation of 
the sample average [31]. We therefore consider the statistical properties of the stochastic 
variable defined by the number of points $N_i(r)$ contained in a sphere of radius $r$ centered on 
the $i^{th}$ point. This depends on the scale $r$ and on the spatial position of the $i^{th}$ sphere's 
center, namely its radial distance $R_i$ from a given origin and its angular coordinates $\vec{\alpha}_i$. 
Integrating over $\vec{\alpha}_i$ for fixed radial distance $R_i$, we obtain that $N_i(r) = N(r; R_i)$ [31, 32]. 
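
The counting procedure can be sketched as follows. This is only a toy illustration under stated assumptions: the sample is a cubic box rather than a survey volume, only spheres fully contained in the box are used, the central point is not counted, and the brute-force neighbour search and the name `counts_in_spheres` are ours.

```python
import math
import random


def counts_in_spheres(points, r, box):
    """N_i(r) for each centre whose sphere of radius r lies fully inside
    the cube [0, box]^3; the centre itself is not counted."""
    counts = []
    for i, p in enumerate(points):
        if any(c < r or c > box - r for c in p):
            continue  # sphere would cross the sample boundary
        n = sum(1 for j, q in enumerate(points)
                if j != i and math.dist(p, q) <= r)
        counts.append(n)
    return counts


rng = random.Random(1)
n_points, box, r = 1500, 1.0, 0.1
# Uncorrelated (Poisson-like) points, used only as a reference case.
points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_points)]

counts = counts_in_spheres(points, r, box)
mean_count = sum(counts) / len(counts)
expected = (n_points - 1) / box**3 * 4.0 / 3.0 * math.pi * r**3
print(f"{len(counts)} centres, mean N(r) = {mean_count:.2f}, "
      f"uniform expectation = {expected:.2f}")
```

For a uniform distribution the mean count is close to the mean density times the sphere volume; for a clustered distribution the conditional counts are systematically larger and fluctuate strongly from centre to centre.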

2.4. Results in galaxy catalogs 

The analysis of $N_i(r; R)$ is found to be very efficient in mapping large scale structures, 
which manifest themselves as large fluctuations in the $N_i(r; R)$ distributions for different 
positions $i$ and sphere radii $r$. For instance, by studying this random variable in various 
three-dimensional slices of the SDSS samples we identify a giant filament covering, in the 
largest contiguous volume of the survey, more than 400 Mpc/h at a distance $R \sim 500$ 
Mpc/h from us. In different sub-samples this analysis reveals a variety of structures, 
showing that large density fluctuations are quite typical. An example is shown in Fig. 2, 
which displays the behavior of $N(r; R_i)$ in three different regions of a sample extracted 
from the SDSS. 

One may note that this analysis is more powerful in tracing large scale galaxy 
structures than the simple counting as a function of radial distance. Indeed, one may 
precisely describe the sequence of structures and voids characterizing the samples and, 
by changing the sphere radius $r$, one may determine the situation at different spatial 
resolutions. For instance the distribution in a certain region (see the bottom panel of 
Fig. 2) is dominated by a single large scale structure, which is known as the SDSS Great 
Wall [16]. (In Fig. 1 we show the projection on the $x-z$ plane of a sample where the 
SDSS Great Wall is placed in the middle.) In some regions, which cover a small enough 
sky area, one is able to isolate well structures at different distances, while in the largest 
contiguous region, which covers a solid angle about six times larger than the other two 
sky areas, the signal is determined by the superposition of many structures of different 
amplitude and at different scales. By the simple visual comparison of the profiles in the 
different regions we can conclude that, although the Great Wall is a particularly long 
filament of galaxies, it represents a typical persistent fluctuation. 

Figure 1. Projection on the $x-z$ plane of a SDSS sample in which the Sloan Great 
Wall appears as a long filament of galaxies. (From [31]). 

More information about the amplitude and location of structures is provided by the full 
$N_i(r) = N(r; x_i, y_i, z_i)$ data, where $(x_i, y_i, z_i)$ are the Cartesian coordinates of the $i^{th}$ 
center. In order to illustrate this point, we have chosen in Fig. 3 a three-dimensional 
representation where on the bottom plane we use the $x$, $z$ Cartesian coordinates of the 
sphere center and on the vertical axis we display the intensity of the structures, i.e. the 
conditional number of galaxies contained in the sphere of radius $r$. (In the $y$ direction 
the thickness of the sample is small, i.e. $\Delta y \approx 15$ Mpc/h.) One may note that the 
SDSS Great Wall is clearly visible as a coherent structure similar to a mountain chain, 
extending all over the sample. It is worth noticing that profiles similar to those shown 
in Fig. 2 and Fig. 3 have been found also in the 2dFGRS [33, 31], supporting the fact that 
these fluctuations are quite typical of the galaxy distribution (see Fig. 4). 

We now pass to the determination of the PDF of the conditional galaxy counts in 
spheres $P(N; r)$, at different resolution $r$, separately in two independent regions of a 
given sample placed at different radial distances. In a first case (left panels of Fig. 5), 
at small scales ($r = 10$ Mpc/h), the distribution is self-averaging both in the earlier 
data release of the SDSS (DR6 sample [52], which covers a solid angle $\Omega_{DR6} = 0.94$ sr) 
and in the sample extracted from the final data release (DR7 [53], with $\Omega_{DR7} = 1.85$ 
sr $\approx 2 \times \Omega_{DR6}$). Indeed, the PDF is statistically the same in the two sub-samples 
considered. Instead, for larger sphere radii, i.e., $r = 80$ Mpc/h (right panels of Fig. 5), in 
the DR6 sample the two PDFs clearly show a systematic difference: not only do the peaks 
not coincide, but the overall shape of the PDF is not smooth and differs between the two 
regions. On the other hand, for the sample extracted from DR7, the two determinations of 
the PDF are in very good agreement. We conclude therefore that in DR6, for $r = 80$ Mpc/h, 
there are large density fluctuations which are not self-averaging because of the limited sample 
volume [31]. They are instead self-averaging in DR7 because the volume is increased by 
a factor of two. 

Figure 2. Behavior of $N(r; R_i)$ in a SDSS sample and in the three different regions 
for $r = 10$ Mpc/h (R1 top, R2 middle and R3 bottom); the horizontal axis is the radial 
distance $R$ (Mpc/h). (From [31]). 

The lack of self-averaging properties at large scales in the DR6 sample is due to 
the presence of large scale galaxy structures which correspond to density fluctuations 
of large amplitude and large spatial extension, whose size is limited only by the sample 
boundaries. The appearance of self-averaging properties in the larger DR7 sample 
volumes is the unambiguous proof that the lack of them is induced by finite-size effects 
due to long-range correlated fluctuations [50]. The lack of self-averaging does not allow 
one to characterize the nature of fluctuations; this is however a clear indication that the 
distribution has not reached spatial uniformity. 

Figure 3. Three dimensional representation of the SL analysis with $r = 10$ Mpc/h 
for R3VL2. The $x$, $z$ coordinates of the sphere center define the bottom plane and on 
the vertical axis we display the intensity of the structures, the conditional number 
of galaxies $N_i(r)$ contained in the sphere of radius $r$. The SDSS Great Wall is 
clearly visible as a coherent structure of large amplitude, similar to a mountain chain, 
extending all over the sample. (From [31]). 

In the deepest sample we consider, which includes mainly bright galaxies, the 
breaking of self-averaging properties likewise does not occur for small $r$ but is found 
for large $r$. This can be due to the same effect, i.e., that the sample volumes are still 
too small, as even in DR7 for $r = 120$ Mpc/h we do not detect self-averaging properties 
(right panels of Fig. 6). Other radial distance-dependent selections, like galaxy evolution 
[54], could in principle result in an effect in the same direction. Even in such a situation, 
our conclusion would be unchanged, i.e. on large enough scales self-averaging is broken. 
The reason why it is broken would then be different: a redshift-dependent effect instead 
of intrinsic fluctuations in the galaxy distribution. Because of these 
large fluctuations in the galaxy density field, self-averaging properties are well-defined 
only in a limited range of scales. Only in that range is it statistically meaningful 
to measure whole-sample average quantities [31], [55]. 

Figure 4. Analysis similar to that shown in Fig. 2 but for a sample of the 2dFGRS. 
On the X and Y axes the coordinates of the center of a sphere of radius $r = 10$ Mpc/h 
(centered on a galaxy) are reported and on the Z axis the number of galaxies inside it. 
The mean thickness of this slice is about 50 Mpc/h. Large fluctuations in the density 
field are located in correspondence with large-scale structures. (From [51]). 

Let us now pass to the determination of the first moment of the PDF, namely the 
conditional average density within radius $r$ defined as 

$$\overline{n(r)} = \frac{1}{M(r)} \sum_{i=1}^{M(r)} n_i(r) = \frac{1}{M(r)} \sum_{i=1}^{M(r)} \frac{N_i(r)}{V(r)} , \qquad (13)$$

where $V(r) = 4\pi r^3/3$ is the volume of the sphere, $M(r)$ is the number of sphere centers 
and $\overline{n(r)}$ is, as discussed above, "conditioned" on the presence of the central galaxy. 
Then, the simplest quantity to further characterize density fluctuations is the conditional 
variance, or mean square deviation at scale $r$, i.e. the second moment of the PDF, which 
is defined as 

$$\sigma^2(r) = \mathrm{var}[n(r)] = \frac{1}{M(r)} \sum_{i=1}^{M(r)} n_i^2(r) - \overline{n(r)}^2 . \qquad (14)$$
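
As a minimal sketch of how the conditional average density and its variance (Eqs. 13 and 14) can be evaluated from a list of counts, assuming $n_i(r) = N_i(r)/V(r)$ with $V(r) = 4\pi r^3/3$; the function name and the four example counts are hypothetical.

```python
import math


def conditional_density_stats(counts, r):
    """Mean and variance of n_i(r) = N_i(r) / V(r) over the M(r) centres."""
    volume = 4.0 / 3.0 * math.pi * r**3
    densities = [n / volume for n in counts]
    m = len(densities)
    mean = sum(densities) / m
    variance = sum(d * d for d in densities) / m - mean**2
    return mean, variance


# Hypothetical counts N_i(r) around four centres, for r = 1 (arbitrary units).
counts = [3, 5, 4, 8]
mean_density, variance = conditional_density_stats(counts, r=1.0)
volume = 4.0 / 3.0 * math.pi
# Rescaled back to counts: mean of N_i and variance of N_i.
print(mean_density * volume, variance * volume**2)
```

Repeating the calculation for several radii gives the scale dependence of both moments.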

At small length scales ($r < 20$ Mpc/h) the conditional average density shows a 
scaling behavior with an exponent close to minus one (see Fig. 7). This result is in 
agreement with the ones obtained by the same method in a number of different samples 
(see [321 EH |33l 131] and references therein). This scaling can be interpreted as a signature 
of fractality of the galaxy distribution in this range of scale. In addition, this implies 
that the distribution is not uniform at these scales, and thus the standard two-point 
correlation function is substantially biased. 

Figure 5. PDF of the conditional galaxy counts in spheres, in the sample defined 
by $R \in [125, 400]$ Mpc/h and $M \in [20.5, 22.2]$ in the DR6 (upper panels) and DR7 
(lower panels) data, for two different values of the sphere radii, $r = 10$ Mpc/h and 
$r = 80$ Mpc/h. In each panel, the black line represents the full-sample PDF, the red 
(green) line the PDF measured in the half of the sample closer to (farther from) the 
origin. (From [50]). 

Then, the average conditional density (Eq. 13) at larger scales ($r > 20$ Mpc/h) 
shows a different $r$ dependence, as can be seen in Fig. 7. Our best fit is 

$$\overline{n(r)} \approx \frac{0.0133}{\log r} , \qquad (15)$$

that is the average density depends only weakly (logarithmically) on $r$. Alternatively, 
an almost indistinguishable power-law fit is provided by 
$$\overline{n(r)} \approx 0.011 \times r^{-0.29} . \qquad (16)$$

We thus find a change of slope in the conditional average density in terms of the radius 
$r$ at about 20 Mpc/h. At this point the decay of the density changes from an inverse 
linear decay to a slow logarithmic one. Moreover, the density $\overline{n(r)}$ does not saturate to 
a constant up to $\sim 80$ Mpc/h, i.e., up to the largest scales probed in this sample. Note 
that up to r = 80 Mpc/h the number of points M(r) is larger than 10^, so that the 
statistics is sufficiently robust. 

This result is in agreement with a study of the SDSS-DR4 samples [56], where a 
similar change of slope was observed at about the same scale $r \approx 20$ Mpc/h, together 
with quite large fluctuations. Indeed, some evidence was subsequently found to 
support that the galaxy distribution is still characterized by rather large fluctuations 
up to 100 Mpc/h, making it incompatible with uniformity |33|, |3ll [32l |3T1 |57j. In the 
Luminous Red Galaxy (LRG) sample of SDSS, Hogg et al. [29] also found that the 
slope changes at $\sim 20$ Mpc/h, but they then claim to detect a transition to uniformity 
at about 70 Mpc/h. We do not observe such a transition in a clear way in the samples 
for which self-averaging properties are satisfied. 

Figure 6. The same as Fig. 5 but for the sample defined by $R \in [200, 600]$ Mpc/h and 
$M \in [21.6, 22.8]$ and for $r = 20, 120$ Mpc/h. (From [50]). 

Figure 7. Conditional average density $\overline{n(r)}$ of galaxies as a function of radius. In 
the inset panel the same is shown in the full range of scales. Note the change of slope 
at $\approx 20$ Mpc/h, from $1/r$ to a much slower decay, and also the lack of flattening up to 
$\sim 80$ Mpc/h. (From [55]). 

Figure 8. Variance of the conditional density $n_i(r)$ as a function of the radius. 
Conversely, the corresponding variance of a Poisson point process would display a $1/r^3$ 
decay. (From [55]). 

Our best fit for the variance $\sigma^2(r)$ of the conditional density (Eq. 14) is (see Fig. 8) 

(T^{r) ^ 0.007 X . (17) 

Given the scaling behavior of the conditional density and variance, we conclude that 
galaxy structures are characterized by non-trivial correlations for scales up to $r \approx 80$ 
Mpc/h. 

Let us now turn to the analysis of the PDF. It is well known that away from 
criticality, when correlations are short-ranged and the correlation length is finite, 
any global (spatially averaged) observable of a macroscopic system has Gaussian 
fluctuations, in agreement with the central limit theorem (CLT). At criticality, 
however, the correlation length tends to infinity, and the CLT no longer applies. 
Indeed, global quantities in critical systems usually have non-Gaussian 
fluctuations. The type of fluctuations is characteristic of the universality class of the 
system's critical behavior [58, 59]. Generally, when correlations are long-ranged, 
long-tailed distributions are found. In this situation some moments of the distribution may 
diverge, as there is a finite probability to find fluctuations far away from the most probable 
value [62]. 

To fit experimental data, the (generalized) Gumbel PDF [55] has often been used, 
where $a$ is a real parameter. For integer values of $a$, this distribution corresponds to the 
$a$-th maximal value of a random variable. The $a = 1$ case corresponds to the Gumbel 
distribution. Experimental examples for Gumbel or generalized Gumbel distributions 
include power consumption of a turbulent flow [63], roughness of voltage fluctuations in 
a resistor (original Gumbel $a = 1$ case) [61], plasma density fluctuations in a tokamak 
[64], orientation fluctuations in a liquid crystal [65], and other systems cited in [60]. The 
Gumbel distribution describing fluctuations of a global observable was first obtained 
analytically in [61] for the roughness fluctuations of $1/f$ noise. Its relations to extreme 
value statistics have been clarified [66, 67], generalizations have appeared [68], and 
related finite size corrections have been understood [69]. 

Figure 9. One of the best fits is obtained for $r = 20$. The data are rescaled by the 
fitted parameters $\alpha$ and $\beta$. The solid line corresponds to the parameter-less Gumbel 
distribution Eq. 20. The inset depicts the same on log-linear scale. (From [55]). 

In a recent paper Bramwell [60] conjectured that only three types of distributions 
appear to describe fluctuations of global observables at criticality. In particular, when 
the global observable depends logarithmically on the system size, the corresponding 
distribution should be a (generalized) Gumbel. For example the mean roughness of 
1/f signals depends on the logarithm of the observation time (system size), and the 
corresponding PDF is indeed the Gumbel distribution [61] . 

The Gumbel (also known as Fisher-Tippett-Gumbel) distribution is one of the three 
extreme value distributions [70, 71]. It describes the distribution of the largest values of 
a random variable from a density function with faster than algebraic (say exponential) 
decay. When fluctuations are characterized by the Gumbel PDF the situation is in 
between the Gaussian case, where all moments are well-defined, and the case in which 
the tail of the PDF scales as a power-law, i.e. the case in which several moments of 
the PDF may diverge, corresponding to an extremely wild fluctuation field. It is 
important to stress that models of galaxy formation predict Gaussian-like fluctuations 
on sufficiently large scales. The key issue indeed concerns the scales at which fluctuations 
show a Gaussian behavior, which is closely related to the problem of the scale at which the 
distribution turns to spatial uniformity [311 ES] • 
The Gumbel distribution's PDF is given by 

$$P(y) = \frac{1}{\beta} \exp\left[ -\frac{y - \alpha}{\beta} - \exp\left( -\frac{y - \alpha}{\beta} \right) \right] . \qquad (18)$$

With the scaling variable 

$$x = \frac{y - \alpha}{\beta} \qquad (19)$$

the density function (Eq. 18) simplifies to the parameter-free Gumbel 

$$P(x) = e^{-x - e^{-x}} \qquad (20)$$

with (cumulative) distribution $e^{-e^{-x}}$. Note that this distribution corresponds to large 
extremes, while for low extreme values, $x$ is used instead of $-x$ in the Gumbel 
distribution. 

The mean and the variance of the Gumbel distribution (Eq. 18) 
are respectively 

$$\mu = \alpha + \gamma \beta , \qquad \sigma^2 = \frac{(\beta \pi)^2}{6} , \qquad (21)$$

where $\gamma = 0.5772\ldots$ is the Euler constant. For the scaled Gumbel (Eq. 20) the first 
two cumulants of Eq. 21 simplify to $\gamma$ and $\pi^2/6$. 
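
The mean and variance quoted for the Gumbel distribution can be checked numerically. The sketch below integrates the density by the midpoint rule over the scaled variable; the integration range, the number of steps and the parameter values $\alpha = 2$, $\beta = 0.5$ are arbitrary choices made for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_pdf(y, alpha, beta):
    """Gumbel density with location alpha and scale beta."""
    x = (y - alpha) / beta          # scaling variable
    return math.exp(-x - math.exp(-x)) / beta


def gumbel_moments(alpha, beta, x_min=-20.0, x_max=60.0, steps=40000):
    """Normalization, mean and variance by midpoint integration."""
    h = (x_max - x_min) / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        y = alpha + beta * (x_min + (k + 0.5) * h)
        w = gumbel_pdf(y, alpha, beta) * beta * h   # probability in this cell
        m0 += w
        m1 += w * y
        m2 += w * y * y
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean**2


norm, mean, var = gumbel_moments(alpha=2.0, beta=0.5)
print(norm, mean, var)
print(2.0 + EULER_GAMMA * 0.5, (0.5 * math.pi) ** 2 / 6.0)  # analytic values
```

The two printed lines should agree closely, which also confirms the value of the Euler constant entering the mean.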

To probe the whole distribution of the conditional density $n_i(r)$, we fitted the 
Gumbel distribution (Eq. 18) via its two parameters $\alpha$ and $\beta$. One of our best fits is 
obtained for $r = 20$ Mpc/h, see Fig. 9. The data, moreover, convincingly collapse to the 
parameter-less Gumbel distribution (Eq. 20) for all values of $r$ for $10 < r < 80$ Mpc/h, 
with the use of the scaling variable $x$ from Eq. 19 (see Figs. 10, 11). Note that for a 
Poisson point process (uncorrelated random points) the number $N(r)$ (and consequently 
also the density) fluctuations are distributed exactly according to a Poisson distribution, 
which in turn converges to a Gaussian distribution for large average number of points 
$N(r)$ per sphere. In our samples, $N(r)$ is always larger than 20 galaxies, where the 
Poisson and the Gaussian PDFs differ less than the uncertainty in our data. Note also 
that due to the central limit theorem, all homogeneous point distributions (not only the 
Poisson process) lead to Gaussian fluctuations. Hence the appearance of the Gumbel 
distribution is a clear sign of inhomogeneity and large scale structures in our samples. 
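
To illustrate the two-parameter fit and the rescaling to the parameter-free form, the sketch below estimates $\alpha$ and $\beta$ from the sample mean and variance using Eq. 21. This moment-matching estimator is a simple stand-in chosen for the example and is not necessarily the fitting procedure used for the data; the synthetic sample and the "true" parameter values are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def sample_gumbel(alpha, beta, rng):
    """Draw one Gumbel variate by inverting the cumulative distribution."""
    u = rng.random()
    while u == 0.0:                 # avoid log(0)
        u = rng.random()
    return alpha - beta * math.log(-math.log(u))


def fit_gumbel_by_moments(values):
    """Estimate (alpha, beta) from the sample mean and variance."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    beta = math.sqrt(6.0 * var) / math.pi
    alpha = mean - EULER_GAMMA * beta
    return alpha, beta


rng = random.Random(2)
true_alpha, true_beta = 0.010, 0.002      # hypothetical density-like values
data = [sample_gumbel(true_alpha, true_beta, rng) for _ in range(20000)]

alpha_hat, beta_hat = fit_gumbel_by_moments(data)
# Rescaled values; for Gumbel data these follow exp(-x - exp(-x)).
scaled = [(v - alpha_hat) / beta_hat for v in data]
scaled_mean = sum(scaled) / len(scaled)
print(alpha_hat, beta_hat, scaled_mean)
```

Applying the same rescaling to samples measured at different radii is what produces the data collapse onto a single curve.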

The fitting parameters in Eq. 18 varied with the radius $r$ approximately as 

0.007 ^ 0.035 , , 

^^^^ P^^- (22) 

although a logarithmic fit $\alpha \sim 0.0115/\log r$ cannot be excluded either. With the fitted 
values of $\alpha$ and $\beta$ we recover the (directly measured) average conditional density of 
galaxies through Eq. 21. On the other hand, we have a discrepancy when comparing 
the directly measured $\sigma^2$ to that obtained from the Gumbel fits through Eq. 21. The 
reason for this discrepancy is that the uncertainty in the tail of the PDF $P(n, r)$ is 
amplified when we directly calculate the second moment. 

Figure 10. Data curves of different $r$ scaled together by fitting parameters $\alpha$ and $\beta$ 
for each curve. The solid line is the parameter-free Gumbel distribution Eq. 20. (From 
[55]). 

Due to the scaling and data collapse we argued that the large scale galaxy 
distribution shows similarities with critical systems [55]. Here the galaxy density around 
each galaxy is analogous to a random variable describing a spatially averaged quantity 
in a volume. The average conditional galaxy density depends on the volume size ($\sim r^3$) 
only logarithmically, $\overline{n(r)} \sim 1/\log r$ from Eq. 15. According to the conjecture of Bramwell 
for critical systems [60], if a spatially averaged quantity depends only weakly (say 
logarithmically) on the system size, the distribution of this quantity follows the Gumbel 
distribution. This is indeed what we see in the galaxy data. Hence our two observations 
about the average density and the density distribution are compatible with the behavior 
of critical systems in statistical physics. We note that standard models of galaxy 
formation predict homogeneous mass distribution beyond ~ 10 Mpc/h [ITJ |32l EI] . To 
explain our findings about non-Gaussian fluctuations up to much larger scales presents 
a challenge for future theoretical galaxy formation models (see [171 |32l |3ll |33l |3ll [55] 
for more details). 

Figure 11. The same as Fig. 10 but on log-linear scale to emphasize the tails of the 
distribution. (From [55]). 

3. Assumptions and basic principles in cosmology 

As we noticed in the introduction, a widespread idea in cosmology is that the so-called 
concordance model of the universe combines two fundamental assumptions. The first is 
that the dynamics of space-time is determined by Einstein's field equations. The second 
is that the universe is homogeneous and isotropic. This hypothesis, usually called the 
Cosmological Principle, is thought to be a generalization of the Copernican Principle 
that "the Earth is not in a central, specially favored position" [72, 73]. The FRW model 
is derived under these two assumptions and it describes the geometry of the universe in 
terms of a single function, the scale factor, which obeys the Friedmann equation [1]. 
There is a subtlety in the relation between the Copernican Principle (all observers are 
equivalent and there are no special points and directions) and the Cosmological Principle 
(the universe is homogeneous and isotropic). Indeed, the fact that the universe looks 
the same, at least in a statistical sense, in all directions and that all observers are alike 
does not imply spatial homogeneity of matter distribution. It is however this latter 
condition that allows us to treat, above a certain scale, the density field as a smooth 
function, a fundamental hypothesis used in the derivation of the FRW metric. Thus 
there are distributions which satisfy the Copernican Principle and which do not satisfy 
the Cosmological Principle [26]. These are statistically homogeneous and isotropic 
distributions which are also spatially inhomogeneous ‖. Therefore the Cosmological 
Principle represents a specific case, holding for spatially homogeneous distributions, of 
the Copernican Principle which is, instead, much more general. Statistical and spatial 
homogeneity refer to two different properties of a given density field. The problem of 
whether a fluctuation field is compatible with the conditions of the absence of special 
points and directions can be reformulated in terms of the properties of the probability 
density functional (PDF) which generates the stochastic field. 

Matter distribution in cosmology is then considered to be a realization of a 
stationary stochastic point process. This is enough to satisfy the Copernican Principle, 
i.e., that there are no special points or directions; however this does not imply spatial 
homogeneity. Spatially homogeneous stationary stochastic processes satisfy the special 
and stronger case of the Copernican Principle described by the Cosmological Principle. 
Indeed, isotropy around each point together with the hypothesis that the matter 
distribution is a smooth function of position, i.e., that this is analytical, implies spatial 
homogeneity. (A formal proof can be found in [75].) This is no longer the case for a 
non-analytic structure (i.e., not smooth), for which the obstacle to applying the FRW 
solutions has in fact solely to do with the lack of spatial homogeneity [77]. 

The condition of spatial homogeneity (uniformity) is satisfied if the ensemble average 
density of the field $\langle \rho \rangle$ is strictly positive. We discussed in the previous sections tests to 
establish whether a point distribution in a given sample (i.e. galaxies) is spatially 
uniform. The additional test provided by the analysis of the PDF of conditional 
fluctuations in disjoint sub-samples, i.e. Eq. 12, allows us to determine whether a distribution 
is also statistically stationary. In the case in which systematic differences are found 
to be important one may then further test whether this is due to the breaking of 
self-averaging properties caused by the presence of large-scale fluctuations or whether this is due to 
the lack of translational invariance. A statistical test able to distinguish between these 
two cases can also give information about the validity of the Copernican Principle, 
as we discuss in the next section ¶. 



3.1. Testing the Copernican and cosmological principles 

Let us first consider a case where translational invariance is broken. We generate 
a Poisson-Radial distribution (PRD), which is an inhomogeneous distribution that can 
mimic the effect of a "local hole" around the origin. In a sphere of radius $R_0 = 1$ we 
place, for instance, N = 2 ■ 10^ points. In each bin at radial distance from the sphere 
center and with thickness $\Delta R$, the distribution is Poissonian with a density 
varying as $n(R) = n_0 \cdot R$, where $n_0$ is a constant. We determine the PDF $P(N; r)$ of 
conditional fluctuations obtained by making a histogram of the values of $N(r; R)$ at 
fixed $r$ (see the upper panels of Fig. 12). The whole-sample PDF is clearly left-skewed: 
this occurs because the peak of the PDF corresponds to the most frequent counts, which 
are at large radial distance simply because shells far away from the origin contain more 
points. The spread of the PDF can easily be related to the difference in the density 
between small and large radial distances in the sample. By computing the PDF in two 
non-overlapping sub-samples, nearby to and far away from the origin, one may clearly 
identify the systematic dependence of this quantity on the specific region where this 
is measured. This breaking of the self-averaging properties is caused by the 
radial-distance dependence of the density and thus by the breaking of translational invariance 
(as noticed above, in the data a redshift-dependent selection effect may cause the same 
result). 

‖ Mandelbrot [74] has introduced a modified version of the Cosmological Principle, named Conditional 
Cosmological Principle, which is the analogue of the Copernican Principle discussed in the text. 
¶ Note that the breaking of the condition of translational invariance may also occur in the presence 
of a redshift-dependent selection effect. Thus the violation of the Copernican Principle due to 
the intrinsic lack of statistical translational invariance can be concluded only if all redshift-dependent 
selection effects are taken into account. 
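
The Poisson-Radial test can be sketched as follows. This is a toy version under stated assumptions: the number of points is kept small for speed (it is not the value used for Fig. 12), the radial law $n(R) \propto R$ is sampled through its cumulative distribution $R^4$, only spheres fully inside the unit sphere are used, and the split at $R = 0.5$ between the "near" and "far" sub-samples is arbitrary. All function names are ours.

```python
import math
import random


def radial_poisson_sample(n_points, rng):
    """Points in the unit sphere with density n(R) proportional to R.
    Shell counts scale as R * R^2 dR, so the radial CDF is R^4."""
    points = []
    for _ in range(n_points):
        radius = rng.random() ** 0.25
        cos_t = rng.uniform(-1.0, 1.0)            # isotropic direction
        sin_t = math.sqrt(1.0 - cos_t * cos_t)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        points.append((radius * sin_t * math.cos(phi),
                       radius * sin_t * math.sin(phi),
                       radius * cos_t))
    return points


def mean_count(points, r, r_lo, r_hi):
    """Average N(r; R) over centres with r_lo <= R < r_hi."""
    origin = (0.0, 0.0, 0.0)
    counts = []
    for i, p in enumerate(points):
        if not (r_lo <= math.dist(p, origin) < r_hi):
            continue
        counts.append(sum(1 for j, q in enumerate(points)
                          if j != i and math.dist(p, q) <= r))
    return sum(counts) / len(counts)


rng = random.Random(3)
points = radial_poisson_sample(1500, rng)      # small N, chosen for speed
r = 0.2
near = mean_count(points, r, 0.0, 0.5)         # sub-sample close to the origin
far = mean_count(points, r, 0.5, 1.0 - r)      # sub-sample far from the origin
print(f"mean N(r) near: {near:.1f}, far: {far:.1f}")
```

The systematic offset between the two sub-samples reflects the radial dependence of the density, i.e. the lack of translational invariance built into this distribution.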

Let us now consider a stationary stochastic distribution, where the breaking of 
self-averaging properties is due to the effect of large scale fluctuations. An example is 
represented by the inhomogeneous toy model (ITM) constructed as follows. We generate 
a stochastic point distribution by randomly placing, in a two-dimensional box of side 
$L$, structures represented by rectangular sticks. We first distribute randomly $N_s$ points 
which are the stick centers: they are characterized by a mean distance $\Lambda \approx (L^2/N_s)^{1/2}$. 
Then the orientation of each stick is chosen randomly. The points belonging to each stick 
are also placed randomly within the stick area, which for simplicity we take to be $\ell \times \ell/10$. 
The length-scale $\ell$ can vary, for example being extracted from a given PDF. The number 
of sticks placed in the box fixes $\Lambda$. This distribution is by construction stationary, i.e., 
there are no special points or directions. When $\ell > L$ and $\Lambda < L$ but with $\ell$ varying in 
such a way that there can be large differences in its size, the resulting distribution is 
long-range correlated, spatially inhomogeneous and it can be not self-averaging. This 
latter case occurs when, by measuring the PDF of conditional fluctuations in different 
regions of a given sample, one finds, for large enough $r$, systematic differences in the 
PDF shape and peak location (see the bottom panels of Fig. 12). These are due to the 
strong correlations extending well over the size of the sample. 
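
A minimal sketch of the stick construction is given below. Several details are assumptions made only for this illustration: each stick carries the same number of points, the stick length is drawn from a uniform distribution through a user-supplied function, and points falling outside the box are simply discarded.

```python
import math
import random


def stick_model(n_sticks, points_per_stick, draw_length, box, rng):
    """Toy 2D point set made of randomly placed and oriented sticks
    of size ell x ell/10, with points uniform inside each stick."""
    points = []
    for _ in range(n_sticks):
        cx, cy = rng.uniform(0.0, box), rng.uniform(0.0, box)   # stick centre
        theta = rng.uniform(0.0, math.pi)                       # orientation
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        ell = draw_length(rng)
        for _ in range(points_per_stick):
            u = rng.uniform(-ell / 2.0, ell / 2.0)      # along the stick
            v = rng.uniform(-ell / 20.0, ell / 20.0)    # across the stick
            x = cx + u * cos_t - v * sin_t
            y = cy + u * sin_t + v * cos_t
            if 0.0 <= x <= box and 0.0 <= y <= box:     # discard points outside
                points.append((x, y))
    return points


rng = random.Random(4)
box, n_sticks, points_per_stick = 1.0, 100, 50
points = stick_model(n_sticks, points_per_stick,
                     lambda g: g.uniform(0.1, 0.5), box, rng)
mean_separation = math.sqrt(box**2 / n_sticks)   # Lambda = (L^2 / N_s)^(1/2)
print(len(points), mean_separation)
```

Counting points in circles of radius $r$ around each point, separately in different regions of the box, then allows the same sub-sample comparison of the PDF as for the radial distribution.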

How can we distinguish between the case in which a distribution is not self-averaging 
because it is not statistically translationally invariant and the case in which it is stationary 
but fluctuations are too extended in space and have too large an amplitude? The clearest 
test is to change the scale $r$ at which $P(N, r)$ is measured, and to determine whether the 
PDF is self-averaging. Indeed, in the case of the PRD the strongest differences between 
the PDF measured in regions placed at small and large radial distance from the structure 
center occur for small $r$. This is because the local density has the largest variations at 
small and large radial distances by construction. When $r$ grows, different radial scales 
are mixed, as the generic sphere of radius $r$ picks up contributions both from points 
nearby the origin and from those far away from it, resulting in a smoothing of local 
differences. Instead, in the ITM for small $r$ the difference is negligible while for large 
enough $r$ the different determinations of the local density start to feel the presence of a 
few large structures which dominate the large scale distribution in the sample. 

Figure 12. Upper panels: The PDF for $r = 0.1$ (left) and $r = 0.3$ (right) for the PRD 
computed for the whole sample (black line). The red (green) line shows the PDF 
measured in the sub-sample placed closer to (farther from) the origin. Lower panels: 
The same for the ITM at scales $r = 0.02$ (left) and $r = 0.1$ (right). (From [50]). 



3.2. Implications for theoretical modeling 

The discussion in the previous sections was meant to treat the statistical properties 
of the galaxy density field in a spatial hyper-surface. As mentioned above, this is 
an approximation valid when considering the galaxy distribution limited to relatively 
low redshifts, i.e. $z < 0.2$. In particular, we have developed a test to focus on the 
properties of statistical and spatial homogeneity in nearby redshift surveys. The 
assumptions of the cosmological model enter in the data analysis when calculating the 
metric distance from the redshift and the absolute magnitude from the apparent one 
and the redshift. However, given that second order corrections are small for $z < 0.2$, our 
results are basically independent of the underlying model chosen to reconstruct metric 
distances and absolute magnitudes from direct observables. In practice we can use just 
a linear dependence of the metric distance on the redshift (which is, to a very good 
approximation, compatible with observations at low redshift). For this same reason we 
can approximate the observed galaxies as lying in a spatial hyper-surface. 
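
To give a feeling for the size of the second order corrections, the sketch below compares the linear relation $R = (c/H_0)\, z$ with the standard second-order expansion of the metric distance, $R \approx (c/H_0)\,[z - (1+q_0) z^2/2]$. The deceleration parameter $q_0 = -0.55$ is an assumed, roughly concordance-like value used only for this example, and distances are expressed in Mpc/h (i.e. $H_0 = 100\,h$ km/s/Mpc).

```python
C_OVER_H0 = 2997.92458   # c / H0 in Mpc/h, for H0 = 100 h km/s/Mpc


def linear_distance(z):
    """Metric distance from a purely linear distance-redshift relation."""
    return C_OVER_H0 * z


def second_order_distance(z, q0):
    """Metric distance including the second-order term in redshift."""
    return C_OVER_H0 * (z - 0.5 * (1.0 + q0) * z * z)


def relative_correction(z, q0):
    """Fractional difference between the linear and second-order distances."""
    return 1.0 - second_order_distance(z, q0) / linear_distance(z)


for z in (0.05, 0.1, 0.2):
    print(z, round(linear_distance(z), 1),
          round(relative_correction(z, q0=-0.55), 4))
```

With this assumed $q_0$ the correction stays at the level of a few percent up to $z = 0.2$; other values of $q_0$ change the number but, at these redshifts, not the order of magnitude.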

In the ideal case of having a very deep survey, up to $z \sim 1$, we should consider that 
we make observations on our past light-cone, which is not a space-like surface. In order 
to evolve our observations onto a spatial surface we would need a cosmological model, 
which at such high redshift can play an important role in the whole determination of 
statistical quantities. A sensible question is whether we can reformulate the statistical 
test so that it can be applied to data on our past light-cone, and not on an assumed 
spatial hyper-surface. Going to higher redshift poses a number of questions, first of 
all that of checking the effect of the assumptions used to construct metric distances 
and absolute magnitudes from direct observables. Testing these effects can be simply 
achieved by using different distance-redshift relations. 

However, we note that a smooth change of the distance-redshift relation as implied 
by a given cosmological model, may change the average behavior of the conditional 
density as a function of redshift but it cannot smooth out fluctuations, i.e. it cannot 
substantially change the PDF of conditional fluctuations when they are measured locally. 
Indeed, our test is based on the characterization of the PDF of conditional fluctuations 
and not only of the behavior of the conditional average density as a function of distance. 
The PDF provides, in principle, a complete characterization of the statistical 
properties of the fluctuations. We have shown that the PDF of fluctuations has a clear imprint 
when the distribution is spherically symmetric or when it is spatially inhomogeneous 
but statistically homogeneous. 

The fact that we analyze conditional fluctuations means that we consider only local 
properties of the fluctuations: local with respect to an observer placed at different radial 
(metric) distances from us, i.e. at different redshifts. For the determination of the 
PDF we have to consider two different length scales: the first is the (average) metric 
distance $R$ of the galaxies on which we center the sphere and the second is the sphere 
radius $r$. Irrespective of the value of $R$, when $r$ is smaller than a few hundred Mpc 
(i.e., when its size is much smaller than any cosmological length scale), we can always 
locally neglect the specific $R(z)$ relation induced by a specific cosmology. In other 
words, when the sphere radius is limited to a few hundred Mpc we can approximate 
the measurements of the conditional density to be performed on a spatial hyper-surface. 

The whole description of the matter density field in terms of FRW or even 
Lemaitre-Tolman-Bondi (LTB) cosmologies refers to the behavior of, for instance, the average 
matter density as a function of time (in the LTB case also as a function of scale), but 
it says nothing about the fluctuation properties of the density field. Thus, when looking 
at different epochs in the evolution of the universe, we should detect that the average 
density varies (being higher in early epochs). This means only that the peak of the 
PDF will be located at different N values, but the shape of the PDF is unchanged by 
this overall (smooth) evolution. Fluctuations are simply not present in the FRW or 
LTB models, and the whole point of back-reaction studies is to understand what their 
effect is. 
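
The statement that a smooth evolution moves the peak of the PDF without changing its shape can be illustrated with a toy calculation. In the sketch below the counts are drawn from a Gumbel distribution purely as an example, and the location, scale and boost factor are hypothetical values; a smooth change of the mean density is modeled, as an assumption, by a constant multiplicative factor. Shot noise, which does depend on the mean, is ignored.

```python
import math
import random


def gumbel_sample(rng, loc, scale):
    """Draw one Gumbel variate by inverting the CDF exp(-exp(-(x - loc) / scale))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return loc - scale * math.log(-math.log(u))


def skewness(values):
    """Sample skewness: third central moment over the variance to the power 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(0)
    counts_late = [gumbel_sample(rng, 100.0, 20.0) for _ in range(20000)]
    boost = 1.5  # hypothetical smooth increase of the mean density at an earlier epoch
    counts_early = [boost * n for n in counts_late]
    # The mean (and hence the peak) moves, the dimensionless shape does not.
    print(sum(counts_late) / len(counts_late), sum(counts_early) / len(counts_early))
    print(skewness(counts_late), skewness(counts_early))
```

Any shape statistic built from standardized moments behaves in the same way, which is why a test based on the form of the PDF is insensitive to the overall smooth evolution.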

Note that models which explain dark energy through inhomogeneity do so using a 
spatial under-density in the matter density which varies on Gpc scales, out to z ~ 1 
[76]. These models, by placing us at the center of the universe, violate the Copernican 
Principle. In this respect we note that, while we cannot make any claim for z > 0.2 
based on current data, the fact that galaxy distribution is spatially inhomogeneous 
but statistically homogeneous up to 100 Mpc/h, already poses intriguing theoretical 
problems. Indeed, in that range of scales, the modeling of the matter density field 
as a perfect fluid, as required by the FRW models, is not even a rough approximation. 
As pointed out by various authors [271 ESIj if the linearity of the Hubble law is a 
consequence of spatial homogeneity, how is it that observations show that it is very well 
linear at the same scales where matter distribution is inhomogeneous? Recently [78] it 
was speculated that a solution to this apparent paradox can be found by considering both 
the effects of back-reaction and the synchronization of clocks. While this is certainly 
an interesting approach, the formulation of a more complete and detailed theoretical 
framework is still lacking. 

Finally we note that there are several complications in radially inhomogeneous 
models at high redshift. Beyond the change of the distance-redshift relation, discussed 
above, another is how structure evolves from our past light-cone onto a surface of 
constant time. Thus in order to make a precise test on the spatial properties of a 
given model, one needs to develop the corresponding theory of structure formation. 
However, at least at low redshifts, it seems implausible that the main feature of the 
model, the specific redshift-dependence of the spatial density, will not be the clearest 
prediction for the observations of galaxy structures. 

4. Conclusions 

We discussed several results showing that galaxy distribution in the newest galaxy 
samples is characterized by large fluctuations. These are manifest in the scaling 
behavior of the conditional density. In particular, at small 
scales this statistic presents a power-law behavior with an exponent close to minus one, 
corresponding to a fractal dimension D ≈ 2. The difference from the dimension 
D = 1.2 reported by other authors (see e.g. [HI [HI [20]) is due to the finite-size effects which 
perturb the estimation of the two-point correlation function ξ(r) [26]. 
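
The link between the exponent and the fractal dimension can be made explicit with a short sketch. It assumes the conditional density follows n(r) ~ r^(-gamma) in three dimensions, so that D = 3 - gamma; the radii, amplitude and function names are illustrative and the data are synthetic.

```python
import math


def fit_power_law(radii, densities):
    """Least-squares fit of log n = log a - gamma * log r; returns (a, gamma)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(n) for n in densities]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


def fractal_dimension(gamma, space_dim=3):
    """For n(r) ~ r^(-gamma) in space_dim dimensions, D = space_dim - gamma."""
    return space_dim - gamma


if __name__ == "__main__":
    radii = [1.0, 2.0, 5.0, 10.0, 20.0]    # Mpc/h, illustrative values
    densities = [0.05 / r for r in radii]  # synthetic n(r) ~ r^-1
    amplitude, gamma = fit_power_law(radii, densities)
    print(gamma, fractal_dimension(gamma))
```

With the same relation, an exponent of 1.8 corresponds to D = 1.2, which is the other value quoted above.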

On larger scales and up to r ≈ 80 Mpc/h a smaller correlation exponent is found to 
fit the data: the density depends, for 20 < r < 80 Mpc/h, only weakly (logarithmically) 
on the system size. Correspondingly, we find that the density fluctuations follow 
the Gumbel distribution of extreme value statistics. This distribution is clearly 
distinguishable from a Gaussian distribution, which would arise for a homogeneous 
spatial galaxy configuration. While in the Gaussian case the rapid decay of the tails of 
the distribution cuts off large fluctuations, in the Gumbel case the large-value tail decays 
more slowly. In such a situation the density field is still inhomogeneous, although not as wild 
as in the case in which the PDF presents power-law tails. 
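
The difference between the two tails can be quantified with a minimal sketch that compares the probability of a fluctuation exceeding the mean by k standard deviations. It uses the standard Gumbel form with cumulative distribution exp(-exp(-z)), whose mean is the Euler-Mascheroni constant and whose standard deviation is pi/sqrt(6) in units of the scale parameter; the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_tail(k):
    """P(x > mean + k * sigma) for a Gumbel distribution (any location and scale)."""
    # In units of the scale parameter: mean = EULER_GAMMA, sigma = pi / sqrt(6).
    z = EULER_GAMMA + k * math.pi / math.sqrt(6.0)
    # Survival function 1 - exp(-exp(-z)), written with expm1 for accuracy.
    return -math.expm1(-math.exp(-z))


def gaussian_tail(k):
    """P(x > mean + k * sigma) for a Gaussian distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, gumbel_tail(k), gaussian_tail(k))
```

The Gumbel tail falls off exponentially in z, whereas the Gaussian tail falls off as exp(-k^2/2), so the ratio between the two grows rapidly with k.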

We showed that in several samples self-averaging properties are not satisfied at 
large scales. This is due to the presence of large scale galaxy structures 
which correspond to density fluctuations of large amplitude and large spatial extension, 
whose size is limited only by the sample boundaries. Note that the lack of self-averaging 
does not allow one to characterize the nature of fluctuations; this is however a clear 
indication that the distribution has not reached spatial uniformity. 
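
One crude way to probe self-averaging, sketched below, is to compare the average counts measured in two disjoint sub-volumes of the same sample. This is an illustrative simplification and not the procedure used in the analysis: the function names and the 3-sigma threshold are assumptions, and the standard error treats the sphere counts as independent, which overlapping spheres are not.

```python
import math


def mean_and_error(values):
    """Sample mean and standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def subsamples_agree(counts_a, counts_b, n_sigma=3.0):
    """True if the two sub-volume averages differ by less than n_sigma combined errors."""
    mean_a, err_a = mean_and_error(counts_a)
    mean_b, err_b = mean_and_error(counts_b)
    return abs(mean_a - mean_b) < n_sigma * math.hypot(err_a, err_b)


if __name__ == "__main__":
    near = [10, 12, 11, 9, 10, 11]   # hypothetical counts in spheres, nearby half
    far = [20, 22, 21, 19, 20, 21]   # hypothetical counts in spheres, distant half
    print(subsamples_agree(near, far))
```

A systematic disagreement between sub-volumes signals that a single structure dominates the statistics at that scale.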

The large scale inhomogeneities detected in the three dimensional galaxy samples 
are at odds with the predictions of standard models. In particular, according to these 
models the density field should present on large scales sub-Poissonian fluctuations, or 
a super-homogeneous nature with negative correlations [MIES]. Forthcoming redshift 
surveys will allow us to clarify whether on such large scales galaxy distribution is still 
inhomogeneous but statistically stationary, or whether the evidence for the breaking 
of spatial translational invariance found in the galaxy samples considered was due to 
selection effects in the data. 

Finally, we discussed that the galaxy distribution is found to be compatible with 
the assumption that it is translationally invariant, i.e. it satisfies the requirement of 
the Copernican Principle that there are no special points or directions. However, because 
spatial homogeneity is not observed, the galaxy distribution is not compatible with the stronger 
assumption of spatial homogeneity encoded in the Cosmological Principle. 